In many organizations throughout the world, both governmental and private, handling documents for a variety of purposes is a major problem. The documents are of current as well as historic interest, and they may contain information printed or typewritten by machine, printed or written by hand, or presented as pictures, drawings and other forms of representation commonly referred to today as "graphics". It is very often necessary to access selected information within a short time, and that information must be retrieved from a large volume of such documents. Not all of the information contained in the documents may be of importance. In addition, that which is of interest may be of greater or lesser importance, depending upon the documents and upon the nature of the organization.
Much information, particularly that which is of historical interest only, is being filed in the form of microfilm, microfiche or similar forms produced by what can be generically described as photographic techniques. In other cases, the information contained in the documents is converted to an encoded form. Depending on the nature of the printing or typing in the original document, this conversion can be accomplished by machines such as optical character readers (OCR), despite the considerable expense of such machines. Other information must be entered into a system by a manual keypunch operation. That technique has gained wide acceptance because of the unsatisfactory nature of the alternatives, but it nevertheless has serious drawbacks because of the errors inherent in the human process of retyping the information. A discussion of various data preparation devices and techniques is to be found in the Encyclopedia of Computer Science and Engineering, Second Edition, Van Nostrand Reinhold Company, New York (1983), beginning at page 480. This text includes a review of the historical development of data preparation and also discusses the expense and difficulty of recycling information within a system to reduce the error percentage.
In most circumstances, it is neither necessarily desirable nor practical to eliminate human intervention. For example, if documents coming into an organization are to be handled and entered into a system, a human operator must review each document, determine its relevance and make some decisions. It is, however, desirable to remove the human process of retyping or keypunching the data because of the error problems discussed above. On the other hand, machine data entry preparation, such as OCR, has, in addition to its expense, the disadvantage that very often the total content of each document must be entered. This approach is wasteful of mass storage, compounds the difficulty of locating and utilizing relevant information at a later time, and usually necessitates reworking the data, depending upon its form and ultimate use.